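Introduction
Computer vision is a field of artificial intelligence that trains computers to interpret and understand the visual world. Using digital images from cameras and videos together with deep learning models, machines can accurately identify and classify objects and then react to what they see.
Over the past few years, deep learning has driven rapid progress in computer vision. In this article I want to discuss one particular computer vision task called segmentation. Researchers have proposed many ways to solve this problem, but I will focus on one specific architecture, U-Net, which uses a fully convolutional network model for the task.
We will use U-Net to build a first-cut solution for the Kaggle Data Science Bowl 2018 challenge.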
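Prerequisites
This article assumes that the reader is already familiar with the basic concepts of machine learning and convolutional networks, and has some working knowledge of building ConvNets with Python and the Keras library.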
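What is segmentation?
The goal of segmentation is to partition an image into perceptually coherent regions. There are two types of segmentation:
- Semantic segmentation (pixel-level prediction based on labeled classes)
- Instance segmentation (object detection and identification of individual objects)
In this article we will focus mainly on semantic segmentation.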
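To make the difference between the two types concrete, here is a minimal sketch in plain Python. It treats a binary mask as a semantic segmentation (every pixel is either "nucleus" or "background") and then derives instance labels by numbering each 4-connected region. The function name `label_instances` and the toy mask are illustrative assumptions, and connected-component labeling is only a rough stand-in for real instance segmentation, since it cannot separate objects that touch.

```python
from collections import deque


def label_instances(mask):
    """Give every 4-connected foreground region its own positive integer label."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and labels[r][c] == 0:
                # Found an unlabeled foreground pixel: start a new instance
                next_label += 1
                labels[r][c] = next_label
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        inside = 0 <= ny < rows and 0 <= nx < cols
                        if inside and mask[ny][nx] and labels[ny][nx] == 0:
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels, next_label


# Semantic view: 1 = nucleus, 0 = background (toy data)
semantic_mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
]

# Instance view: each separate blob gets its own id
instance_labels, num_instances = label_instances(semantic_mask)
for row in instance_labels:
    print(row)
print("instances found:", num_instances)
```

The semantic mask only says which pixels belong to the class, while the instance labels also say which object each pixel belongs to.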
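What is U-Net?
U-Net was created in 2015 as a CNN developed specifically for biomedical image segmentation. It has since become a very popular end-to-end encoder-decoder network for semantic segmentation. It has a distinctive down-then-up structure, with a contracting path and an expanding path.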
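U-Net architecture
The U-Net downsampling path consists of 4 blocks, each with the following layers:
- 3x3 CONV (ReLU, with batch normalization and dropout)
- 3x3 CONV (ReLU, with batch normalization and dropout)
- 2x2 max pooling
As we move down through these blocks, the number of feature maps doubles at each step, starting at 64 and then going to 128, 256, and 512.
The bottleneck consists of 2 CONV layers with batch normalization and dropout.
Like the downsampling path, the upsampling path consists of 4 blocks, each with the following layers:
- Transposed convolution (deconvolution) layer
- Concatenation with the corresponding feature map from the contracting path
- 3x3 CONV (ReLU, with batch normalization and dropout)
- 3x3 CONV (ReLU, with batch normalization and dropout)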
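The block structure above can be traced with a little arithmetic. The sketch below assumes 'same' padding (so only pooling and transposed convolutions change the spatial size) and assumes the bottleneck continues the doubling pattern; the function `trace_unet_shapes` is a hypothetical helper written for illustration and does not build a real network.

```python
def trace_unet_shapes(size=128, base_filters=64, depth=4):
    """Return (stage, spatial_size, channels) tuples for a U-Net with 'same' padding."""
    shapes = []
    filters = base_filters
    # Contracting path: record the conv output, then halve the size and double the filters
    for level in range(1, depth + 1):
        shapes.append((f"down{level}", size, filters))
        size //= 2
        filters *= 2
    # Bottleneck (assumed to keep doubling the filter count)
    shapes.append(("bottleneck", size, filters))
    # Expanding path: upsample by 2 and halve the filters at each block
    for level in range(depth, 0, -1):
        size *= 2
        filters //= 2
        shapes.append((f"up{level}", size, filters))
    return shapes


for stage, spatial, channels in trace_unet_shapes():
    print(f"{stage:<10} {spatial:>3} x {spatial:<3} x {channels}")
```

Because the sizes on the way up mirror those on the way down, each upsampled feature map can be concatenated with the contracting-path feature map of the same level. Calling the helper with base_filters=16 reproduces the channel counts used in the Keras model later in this article (16, 32, 64, 128, 256).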
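KAGGLE DATA SCIENCE BOWL 2018 CHALLENGE
The main task of this challenge is to detect cell nuclei in images. By automating nucleus detection, you can help unlock cures faster. Identifying nuclei is the starting point for most analyses, because most of the human body's 30 trillion cells contain a nucleus full of DNA, the genetic code that programs each cell. Identifying nuclei lets researchers identify each individual cell in a sample, and by measuring how cells respond to various treatments, researchers can understand the underlying biological processes.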
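Sample images, objective, and approach
We will use U-Net, a CNN designed specifically for segmentation tasks, to generate the image masks automatically.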
Import all necessary packages and modules

import os
import sys
import random
import warnings

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tqdm import tqdm
from itertools import chain
from skimage.io import imread, imshow, imread_collection, concatenate_images
from skimage.transform import resize
from skimage.morphology import label
from keras.models import Model, load_model
# Importing every layer from keras.layers works on both older and newer Keras releases
from keras.layers import Input, Dropout, Lambda, Conv2D, Conv2DTranspose, MaxPooling2D, concatenate
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras import backend as K
import tensorflow as tf

IMG_WIDTH = 128
IMG_HEIGHT = 128
IMG_CHANNELS = 3
TRAIN_PATH = './U_NET/train/'
TEST_PATH = './U_NET/validation/'

warnings.filterwarnings('ignore', category=UserWarning, module='skimage')

# Seed both random number generators. The seed functions must be called;
# assigning to them (random.seed = seed) would replace the function without seeding anything.
seed = 42
random.seed(seed)
np.random.seed(seed)
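Collect the file names for our training and test data

# Each sample lives in its own sub-directory, so the directory names are the sample IDs
train_ids = next(os.walk(TRAIN_PATH))[1]
test_ids = next(os.walk(TEST_PATH))[1]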
Create the 128 x 128 image masks (initialized as black images)

print('Getting and resizing training images ... ')
X_train = np.zeros((len(train_ids), IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS), dtype=np.uint8)
# Use the built-in bool: the np.bool alias has been removed from recent NumPy releases
Y_train = np.zeros((len(train_ids), IMG_HEIGHT, IMG_WIDTH, 1), dtype=bool)

# Resize our training images to 128 x 128.
# Note: sys.stdout prints info that can be cleared, unlike print.
# Using tqdm allows us to create progress bars.
sys.stdout.flush()
for n, id_ in tqdm(enumerate(train_ids), total=len(train_ids)):
    path = TRAIN_PATH + id_
    img = imread(path + '/images/' + id_ + '.png')[:, :, :IMG_CHANNELS]
    img = resize(img, (IMG_HEIGHT, IMG_WIDTH), mode='constant', preserve_range=True)
    X_train[n] = img
    mask = np.zeros((IMG_HEIGHT, IMG_WIDTH, 1), dtype=bool)
    # Take all masks associated with this image and combine them into one single mask
    for mask_file in next(os.walk(path + '/masks/'))[2]:
        mask_ = imread(path + '/masks/' + mask_file)
        mask_ = np.expand_dims(resize(mask_, (IMG_HEIGHT, IMG_WIDTH), mode='constant',
                                      preserve_range=True), axis=-1)
        mask = np.maximum(mask, mask_)
    # Y_train[n] is now the single combined mask for this image
    Y_train[n] = mask

# Get and resize the test images
X_test = np.zeros((len(test_ids), IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS), dtype=np.uint8)
sizes_test = []
print('Getting and resizing test images ... ')
sys.stdout.flush()
for n, id_ in tqdm(enumerate(test_ids), total=len(test_ids)):
    path = TEST_PATH + id_
    img = imread(path + '/images/' + id_ + '.png')[:, :, :IMG_CHANNELS]
    # Remember the original size so predictions can be resized back later
    sizes_test.append([img.shape[0], img.shape[1]])
    img = resize(img, (IMG_HEIGHT, IMG_WIDTH), mode='constant', preserve_range=True)
    X_test[n] = img

print('Done!')
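The mask-merging step in the loop above is easy to check in isolation. The following is a minimal sketch with NumPy, assuming each nucleus comes as its own 0/255 mask of the same size; the helper name `combine_masks` and the two toy masks are made up for illustration.

```python
import numpy as np


def combine_masks(masks):
    """Merge per-object masks into one boolean mask using an element-wise maximum."""
    combined = np.zeros(masks[0].shape, dtype=bool)
    for m in masks:
        # Any non-zero pixel counts as foreground; maximum on booleans acts as a logical OR
        combined = np.maximum(combined, m > 0)
    return combined


# Two toy 4 x 4 masks, each marking one hypothetical nucleus with the value 255
nucleus_a = np.zeros((4, 4), dtype=np.uint8)
nucleus_a[0:2, 0:2] = 255
nucleus_b = np.zeros((4, 4), dtype=np.uint8)
nucleus_b[2:4, 2:4] = 255

combined = combine_masks([nucleus_a, nucleus_b])
print(combined.astype(int))
```

The combined mask marks every pixel covered by at least one nucleus, which is exactly the per-pixel target that semantic segmentation needs. The information about which nucleus a pixel belongs to is discarded at this point.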
Build the U-Net model

def my_iou_metric(label, pred):
    # iou_metric_batch is a NumPy helper that is not shown in this article;
    # it must be defined before the model is compiled.
    # tf.numpy_function replaces tf.py_func, which is not available in TensorFlow 2.
    metric_value = tf.numpy_function(iou_metric_batch, [label, pred], tf.float32)
    return metric_value

# Build the U-Net model.
# Note that we keep each layer in a variable so that we can concatenate (stack) them later.
# This is required to re-create the U-Net skip connections.
inputs = Input((IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS))
s = Lambda(lambda x: x / 255)(inputs)  # scale pixel values to [0, 1]

# Contracting path
c1 = Conv2D(16, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same')(s)
c1 = Dropout(0.1)(c1)
c1 = Conv2D(16, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same')(c1)
p1 = MaxPooling2D((2, 2))(c1)

c2 = Conv2D(32, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same')(p1)
c2 = Dropout(0.1)(c2)
c2 = Conv2D(32, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same')(c2)
p2 = MaxPooling2D((2, 2))(c2)

c3 = Conv2D(64, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same')(p2)
c3 = Dropout(0.2)(c3)
c3 = Conv2D(64, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same')(c3)
p3 = MaxPooling2D((2, 2))(c3)

c4 = Conv2D(128, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same')(p3)
c4 = Dropout(0.2)(c4)
c4 = Conv2D(128, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same')(c4)
p4 = MaxPooling2D(pool_size=(2, 2))(c4)

# Bottleneck
c5 = Conv2D(256, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same')(p4)
c5 = Dropout(0.3)(c5)
c5 = Conv2D(256, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same')(c5)

# Expanding path
u6 = Conv2DTranspose(128, (2, 2), strides=(2, 2), padding='same')(c5)
u6 = concatenate([u6, c4])
c6 = Conv2D(128, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same')(u6)
c6 = Dropout(0.2)(c6)
c6 = Conv2D(128, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same')(c6)

u7 = Conv2DTranspose(64, (2, 2), strides=(2, 2), padding='same')(c6)
u7 = concatenate([u7, c3])
c7 = Conv2D(64, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same')(u7)
c7 = Dropout(0.2)(c7)
c7 = Conv2D(64, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same')(c7)

u8 = Conv2DTranspose(32, (2, 2), strides=(2, 2), padding='same')(c7)
u8 = concatenate([u8, c2])
c8 = Conv2D(32, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same')(u8)
c8 = Dropout(0.1)(c8)
c8 = Conv2D(32, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same')(c8)

u9 = Conv2DTranspose(16, (2, 2), strides=(2, 2), padding='same')(c8)
u9 = concatenate([u9, c1], axis=3)
c9 = Conv2D(16, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same')(u9)
c9 = Dropout(0.1)(c9)
c9 = Conv2D(16, (3, 3), activation='elu', kernel_initializer='he_normal', padding='same')(c9)

# Note that our output is effectively a 128 x 128 mask
outputs = Conv2D(1, (1, 1), activation='sigmoid')(c9)

model = Model(inputs=[inputs], outputs=[outputs])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=[my_iou_metric])
model.summary()
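The metric wrapper above calls a NumPy function, `iou_metric_batch`, that this article does not show. As a rough idea of what such a function might compute, here is a minimal sketch of a batch-averaged intersection-over-union (IoU) score. The names `iou_single` and `iou_metric_batch_sketch`, the 0.5 threshold, and the handling of empty masks are all assumptions; the notebook's own version may differ, for example by averaging over several IoU thresholds.

```python
import numpy as np


def iou_single(y_true, y_pred, threshold=0.5):
    """IoU between one ground-truth mask and one predicted probability map."""
    true_mask = np.asarray(y_true) > threshold
    pred_mask = np.asarray(y_pred) > threshold
    union = np.logical_or(true_mask, pred_mask).sum()
    if union == 0:
        # Both masks are empty: treat this as a perfect match (an assumption)
        return 1.0
    intersection = np.logical_and(true_mask, pred_mask).sum()
    return float(intersection) / float(union)


def iou_metric_batch_sketch(y_true_batch, y_pred_batch):
    """Mean IoU over a batch, returned as float32 so it can be wrapped as a TensorFlow metric."""
    scores = [iou_single(t, p) for t, p in zip(y_true_batch, y_pred_batch)]
    return np.float32(np.mean(scores))


# Toy batch: the first prediction finds one of two foreground pixels, the second is empty on both sides
y_true = np.array([[[1, 1], [0, 0]], [[0, 0], [0, 0]]])
y_pred = np.array([[[0.9, 0.2], [0.1, 0.1]], [[0.1, 0.0], [0.2, 0.3]]])
print(iou_metric_batch_sketch(y_true, y_pred))
```

IoU rewards overlap and penalizes both missed pixels and false positives, which makes it more informative than pixel accuracy when most of the image is background.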
Train our model

model_path = "./nuclei_finder_unet_1.h5"

# Save the model only when the validation loss improves
checkpoint = ModelCheckpoint(model_path, monitor="val_loss", mode="min", save_best_only=True, verbose=1)

# Stop training once the validation loss has not improved for 5 epochs
earlystop = EarlyStopping(monitor='val_loss', min_delta=0, patience=5, verbose=1, restore_best_weights=True)

# Fit our model
results = model.fit(X_train, Y_train, validation_split=0.1, batch_size=16, epochs=10,
                    callbacks=[earlystop, checkpoint])
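The interaction between `patience` and `min_delta` in the early-stopping callback can be illustrated with a small simulation. This is a simplified model of patience-based early stopping written in plain Python, not Keras's actual implementation; the function name `early_stopping_epoch` and the loss values are invented for the example.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return (epoch at which training stops, epoch with the best loss), both 0-based."""
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # Improvement: remember it and reset the patience counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Ran out of epochs before patience was exhausted
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses: the best value occurs at epoch 1, then the loss stagnates
losses = [0.50, 0.40, 0.41, 0.42, 0.43, 0.44, 0.45, 0.30]
stopped_at, best_at = early_stopping_epoch(losses, patience=5)
print("stopped at epoch", stopped_at, "best epoch", best_at)
```

In this toy run the later improvement at epoch 7 is never reached, which is the trade-off of a small patience value. With only 10 epochs and a patience of 5, as configured above, early stopping can only trigger in the second half of training.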
Generate predictions on the validation data

# Predict on training and validation data.
# Note that the custom IoU metric must be passed in through custom_objects when loading the model.
model = load_model('./nuclei_finder_unet_1.h5', custom_objects={'my_iou_metric': my_iou_metric})

# The first 90% of the data was used for training
preds_train = model.predict(X_train[:int(X_train.shape[0] * 0.9)], verbose=1)

# The last 10% was used for validation
preds_val = model.predict(X_train[int(X_train.shape[0] * 0.9):], verbose=1)

# preds_test = model.predict(X_test, verbose=1)

# Threshold the predicted probabilities at 0.5 to get binary masks
preds_train_t = (preds_train > 0.5).astype(np.uint8)
preds_val_t = (preds_val > 0.5).astype(np.uint8)
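The 90% / 10% slicing above only matches what the model saw during training because Keras takes the validation samples from the end of the arrays, before any shuffling. A minimal sketch of that index arithmetic follows; the helper `split_indices` and the sample counts are hypothetical and used only for illustration.

```python
def split_indices(num_samples, train_fraction=0.9):
    """Return index ranges for the leading training part and the trailing validation part."""
    split_at = int(num_samples * train_fraction)
    return range(0, split_at), range(split_at, num_samples)


# Example with a hypothetical data set of 670 samples
train_idx, val_idx = split_indices(670)
print("training samples:  ", len(train_idx), "indices", train_idx[0], "to", train_idx[-1])
print("validation samples:", len(val_idx), "indices", val_idx[0], "to", val_idx[-1])
```

If the arrays were shuffled or reordered between training and prediction, these slices would no longer line up with the original split, and validation scores computed from them would be misleading.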
Display our predicted masks on the training data

# Pick a random sample from the training part of the data
ix = random.randint(0, len(preds_train_t) - 1)
plt.figure(figsize=(20, 20))

# Our original training image
plt.subplot(131)
imshow(X_train[ix])
plt.title("Image")

# Our original combined mask
plt.subplot(132)
imshow(np.squeeze(Y_train[ix]))
plt.title("Mask")

# The mask our U-Net model predicts
plt.subplot(133)
imshow(np.squeeze(preds_train_t[ix] > 0.5))
plt.title("Predictions")
plt.show()
Finally, here is the complete code:
Dataset: https://www.kaggle.com/c/data-science-bowl-2018
Code for this article: https://github.com/bhaveshgoyal27/mediumblogs/blob/master/U-Net.ipynb