SegFormer: an efficient, accurate, and simple Transformer-based segmentation network

The paper's two main contributions:

- A novel positional-encoding-free, hierarchical Transformer encoder.
- A lightweight all-MLP decoder that fuses multi-level features and yields a powerful representation without complex and computationally demanding modules.

Hierarchical Transformer encoder

The paper designs a series of Mix Transformer encoders (MiT), MiT-B0 to MiT-B5.

Overlapped patch merging

Patch merging halves the height and width of the feature map while doubling the channel count. It acts like pooling: using it for downsampling yields multi-scale feature maps. As for non-overlapping patch merging: that process was initially designed to combine non-overlapping image or feature patches, so it fails to preserve local continuity around those patches. The paper therefore uses overlapping patch merging with K = 7, S = 4, P = 3 and K = 3, S = 2, P = 1 (kernel size, stride, padding), which produces features of the same size as the non-overlapping process while preserving local continuity.

Efficient self-attention

In the original multi-head self-attention layer, Q, K, V in each head have the same dimensions $N \times C$, where $N = H \times W$ is the sequence length; the computational complexity is $O(N^2)$, which is infeasible for high-resolution images. The paper uses a reduction ratio R to shorten the sequence length of K and V:

$$\hat{K} = \mathrm{Reshape}\left(\tfrac{N}{R},\ C \cdot R\right)(K), \qquad K = \mathrm{Linear}(C \cdot R,\ C)(\hat{K})$$

K is the sequence being reduced: it is first reshaped into a sequence of length N/R with dimension C·R, and a linear layer then adjusts the dimension back to C. The complexity drops to $O(N^2 / R)$; the paper sets R to [64, 16, 4, 1] for the four stages.

This is SegFormer's self-attention implementation in mmsegmentation (note that the Q/K/V attention itself is inherited from PyTorch's MultiheadAttention):

```python
class EfficientMultiheadAttention(MultiheadAttention):

    def forward(self, x, hw_shape, identity=None):
        x_q = x
        if self.sr_ratio > 1:
            # Spatially reduce K/V: (B, N, C) -> (B, C, H, W) -> strided conv -> (B, N/R, C)
            x_kv = nlc_to_nchw(x, hw_shape)
            x_kv = self.sr(x_kv)
            x_kv = nchw_to_nlc(x_kv)
            x_kv = self.norm(x_kv)
        else:
            x_kv = x

        if identity is None:
            identity = x_q

        out = self.attn(query=x_q, key=x_kv, value=x_kv, need_weights=False)[0]
        return identity + self.dropout_layer(self.proj_drop(out))
```

So the proposed efficient self-attention essentially adds one hyperparameter, `sr_ratio`, which controls the size of the K/V matrices. The reduction is implemented as a strided convolution:

```python
self.sr_ratio = sr_ratio
if sr_ratio > 1:
    # kernel_size = stride = sr_ratio shrinks H and W by sr_ratio each,
    # i.e. the K/V sequence length shrinks by R = sr_ratio**2
    self.sr = Conv2d(
        in_channels=embed_dims,
        out_channels=embed_dims,
        kernel_size=sr_ratio,
        stride=sr_ratio)
```
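To make the sequence reduction concrete, here is a self-contained shape check (a minimal sketch; the tensor sizes and `sr_ratio` value are illustrative, not taken from the mmsegmentation code):

```python
import torch
from torch import nn

# Illustrative stage-1 sizes for a 512x512 input after 4x patch embedding.
B, C, H, W = 1, 32, 128, 128
sr_ratio = 8                                   # stage-1 setting; R = sr_ratio**2 = 64

x = torch.randn(B, H * W, C)                   # (B, N, C) with N = 16384 query tokens

sr = nn.Conv2d(C, C, kernel_size=sr_ratio, stride=sr_ratio)
x_kv = x.transpose(1, 2).reshape(B, C, H, W)   # (B, N, C) -> (B, C, H, W)
x_kv = sr(x_kv)                                # -> (B, C, H/8, W/8)
x_kv = x_kv.reshape(B, C, -1).transpose(1, 2)  # -> (B, N/R, C)

print(x.shape, x_kv.shape)                     # (1, 16384, 32) and (1, 256, 32)
```

Attention between 16384 queries and 256 keys costs on the order of N·N/R operations instead of N², a 64× saving at this stage.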
Mix-FFN

ViT uses positional encoding (PE) to introduce location information, but the PE has a fixed resolution. When the test resolution differs from the training resolution, the positional code must be interpolated, which usually degrades accuracy. The authors argue that positional encoding is not actually necessary for semantic segmentation; SegFormer instead introduces Mix-FFN, which passes location information by applying a 3×3 convolution directly inside the feed-forward network (FFN).

```python
class MixFFN(BaseModule):

    def __init__(self,
                 embed_dims,
                 feedforward_channels,
                 act_cfg=dict(type='GELU'),
                 ffn_drop=0.,
                 dropout_layer=None,
                 init_cfg=None):
        super(MixFFN, self).__init__(init_cfg)
        self.embed_dims = embed_dims
        self.feedforward_channels = feedforward_channels
        self.act_cfg = act_cfg
        self.activate = build_activation_layer(act_cfg)

        in_channels = embed_dims
        fc1 = Conv2d(
            in_channels=in_channels,
            out_channels=feedforward_channels,
            kernel_size=1,
            stride=1,
            bias=True)
        # 3x3 depth-wise conv to provide positional encoding information
        pe_conv = Conv2d(
            in_channels=feedforward_channels,
            out_channels=feedforward_channels,
            kernel_size=3,
            stride=1,
            padding=(3 - 1) // 2,
            bias=True,
            groups=feedforward_channels)
        fc2 = Conv2d(
            in_channels=feedforward_channels,
            out_channels=in_channels,
            kernel_size=1,
            stride=1,
            bias=True)
        drop = nn.Dropout(ffn_drop)
        layers = [fc1, pe_conv, self.activate, drop, fc2, drop]
        self.layers = Sequential(*layers)
        self.dropout_layer = build_dropout(
            dropout_layer) if dropout_layer else torch.nn.Identity()

    def forward(self, x, hw_shape, identity=None):
        out = nlc_to_nchw(x, hw_shape)
        out = self.layers(out)
        out = nchw_to_nlc(out)
        if identity is None:
            identity = x
        return identity + self.dropout_layer(out)
```

Note that in Transformer models the activation function is no longer the familiar ReLU but GELU, and this substitution is not arbitrary. In neural network modeling, nonlinearity is an essential property, and for generalization we also add stochastic regularization such as dropout (randomly zeroing some outputs, which is itself a kind of random nonlinear activation). Conventionally, stochastic regularization and the nonlinear activation are two separate mechanisms, yet a layer's input is in fact shaped by both together. GELU brings the idea of stochastic regularization into the activation itself: it is a probabilistic description of a neuron's input, which is intuitively more natural, and empirically it outperforms both ReLU and ELU. Like the others, GELU effectively multiplies the input by 0 or 1, but unlike a mask that is either deterministic (ReLU) or input-independent random (dropout), GELU's 0-1 mask is random while also depending on the distribution of the input. It can be understood as: GELU's weight on the current input reflects the probability that it is larger than the other inputs, i.e. $\mathrm{GELU}(x) = x \cdot \Phi(x)$ with $\Phi$ the standard normal CDF. Since neuron inputs tend toward a normal distribution, this setup means that as the input x decreases it has a higher probability of being dropped, so the activation transform becomes randomly dependent on the input. The GELU paper also gives an approximate form:

$$\mathrm{GELU}(x) \approx 0.5\,x\left(1 + \tanh\left[\sqrt{2/\pi}\left(x + 0.044715\,x^{3}\right)\right]\right)$$

The BERT source implements the exact erf form (in TensorFlow):

```python
def gelu(input_tensor):
    cdf = 0.5 * (1.0 + tf.erf(input_tensor / tf.sqrt(2.0)))
    return input_tensor * cdf
```

In Transformers, GELU indeed works somewhat better in practice.

On the PE-vs-Mix-FFN ablation, Table 1c of the paper shows that for a given resolution, the Mix-FFN approach clearly outperforms positional encoding. Moreover, it is less sensitive to differences in test resolution: accuracy drops 3.3% when using positional encoding at a lower resolution, while with the proposed Mix-FFN the drop is only 0.7%. From these results, the authors conclude that Mix-FFN produces better and more robust encoders than positional encoding.

Decoder

Over the past several years, semantic segmentation work from the DeepLab series to PSPNet to DANet has focused on designing better decoders (the encoder is usually just a backbone), and decoders have grown ever heavier and more complex. The central problem in semantic segmentation is enlarging the receptive field. For a CNN encoder, the effective receptive field (ERF) is small and local, so decoder designs are needed to enlarge it; ASPP, for example, uses dilated convolutions of different rates for exactly this purpose. For a Transformer encoder, however, self-attention makes the ERF very large, so the decoder needs no extra machinery to enlarge it (the authors tried a pile of segmentation heads with essentially no gain). The paper compares ERF visualizations for DeepLabv3+ and SegFormer (on the ERF concept, see "Understanding the Effective Receptive Field in Deep Convolutional Neural Networks"):

- The ERF of DeepLabv3+ is relatively small even at Stage-4, the deepest stage.
- SegFormer's encoder naturally produces local attentions that resemble convolutions at lower stages, while outputting highly non-local attentions that effectively capture context at Stage-4.
- As shown with the zoom-in patches in Figure 3 of the paper, the ERF of the MLP head (blue box) differs from Stage-4 (red box), with significantly stronger local attention in addition to the non-local attention.

Because of self-attention, the receptive field is already large enough at the SegFormer encoder stage, so the decoder does not need a heavy head:

```python
class SegformerHead(BaseDecodeHead):

    def __init__(self, interpolate_mode='bilinear', **kwargs):
        super().__init__(input_transform='multiple_select', **kwargs)
        self.interpolate_mode = interpolate_mode
        num_inputs = len(self.in_channels)
        assert num_inputs == len(self.in_index)

        self.convs = nn.ModuleList()
        for i in range(num_inputs):
            self.convs.append(
                ConvModule(
                    in_channels=self.in_channels[i],
                    out_channels=self.channels,
                    kernel_size=1,
                    stride=1,
                    norm_cfg=self.norm_cfg,
                    act_cfg=self.act_cfg))

        self.fusion_conv = ConvModule(
            in_channels=self.channels * num_inputs,
            out_channels=self.channels,
            kernel_size=1,
            norm_cfg=self.norm_cfg)

    def forward(self, inputs):
        # Receives the 4-stage backbone feature maps: 1/4, 1/8, 1/16, 1/32
        inputs = self._transform_inputs(inputs)
        outs = []
        for idx in range(len(inputs)):
            x = inputs[idx]
            conv = self.convs[idx]
            outs.append(
                resize(
                    input=conv(x),
                    size=inputs[0].shape[2:],
                    mode=self.interpolate_mode,
                    align_corners=self.align_corners))

        out = self.fusion_conv(torch.cat(outs, dim=1))
        out = self.cls_seg(out)
        return out
```
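The head above depends on mmsegmentation scaffolding (BaseDecodeHead, ConvModule, resize). As a minimal, framework-free sketch of the same all-MLP idea (the class name is mine, and the channel counts are the MiT-B0 values; treat it as an approximation, not the exact SegformerHead):

```python
import torch
from torch import nn
import torch.nn.functional as F

class SimpleAllMLPHead(nn.Module):
    """Unify channels (1x1 conv), upsample all maps to 1/4 scale, concat, fuse, predict."""

    def __init__(self, in_channels=(32, 64, 160, 256), embed_dim=256, num_classes=19):
        super().__init__()
        self.projs = nn.ModuleList(
            nn.Conv2d(c, embed_dim, kernel_size=1) for c in in_channels)
        self.fuse = nn.Conv2d(embed_dim * len(in_channels), embed_dim, kernel_size=1)
        self.classify = nn.Conv2d(embed_dim, num_classes, kernel_size=1)

    def forward(self, feats):
        size = feats[0].shape[2:]          # spatial size of the 1/4-scale map
        outs = [
            F.interpolate(proj(f), size=size, mode='bilinear', align_corners=False)
            for proj, f in zip(self.projs, feats)
        ]
        out = self.fuse(torch.cat(outs, dim=1))
        return self.classify(out)          # (B, num_classes, H/4, W/4)

# Fake multi-scale features for a 512x512 input: strides 4, 8, 16, 32.
feats = [torch.randn(1, c, 512 // s, 512 // s)
         for c, s in zip((32, 64, 160, 256), (4, 8, 16, 32))]
print(SimpleAllMLPHead()(feats).shape)     # torch.Size([1, 19, 128, 128])
```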
Experimental results

(The paper's benchmark tables and visual comparisons are not reproduced in these notes.)

Some reflections from SegFormer's first author:

- For semantic segmentation, feature extraction is crucial, and Transformers have already proven to be stronger feature extractors than CNNs on classification. But a gap remains between classification and segmentation, so how to design Transformer structures that are friendlier to segmentation is still worth studying.
- Given good features, how should the decoder be designed to push performance further? Here a very simple MLP decoder achieves good results, while traditional decoders such as ASPP bring essentially no benefit on top of a Transformer; how to design better decoders specifically for Transformers is also worth exploring.
- A thought on Transformers in general: downsampling and position encoding are both trending back toward convolutions, regressing to CNN-style architectural design (Swin Transformer adopts the CNN ideas of locality and hierarchy). In the end, compared with CNNs, perhaps the only genuinely novel contribution of Vision Transformers is local self-attention.
Setting environment variables

Add your compiler's bin directory to PATH, for example:

- E:\Dev-Cpp\MinGW64\bin (the MinGW bundled with Dev-C++)
- D:\env\gcc\gcc-12.1\ucrt\bin (a separately downloaded MinGW gcc)

If you use the third method below (CMake), remember to choose "add to PATH" when installing CMake. Once all environment variables are set, test them in a terminal:

```bash
gcc -v
cmake
```

VSCode extensions

Install the following extensions:

- Chinese (Simplified) Language Pack for Visual Studio Code
- C/C++
- Better C++ Syntax
- CMake
- CMake Tools
- Code Runner

There are three ways to build C/C++ with VSCode, introduced in turn below. The first and second suit ad-hoc, single-file compilation.

Method 1: the buttons provided by the C/C++ extension

That is, build/run and debug with the first and third buttons in the editor's top-right corner (shown in the original screenshot). Create a folder for your .cpp files, open it with VSCode, and create a .vscode subfolder containing two JSON files: launch.json and tasks.json.

launch.json:

```json
{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": []
}
```

tasks.json:

```json
{
    "tasks": [
        {
            "type": "cppbuild",
            "label": "C/C++: g++.exe build active file",
            "command": "E:\\Dev-Cpp\\MinGW64\\bin\\g++.exe",
            "args": [
                "-fdiagnostics-color=always",
                "-g",
                "-std=c++11",
                "${file}",
                "-o",
                "${fileDirname}\\${fileBasenameNoExtension}.exe"
            ],
            "options": {
                "cwd": "${fileDirname}"
            },
            "problemMatcher": ["$gcc"],
            "group": {
                "kind": "build",
                "isDefault": true
            },
            "detail": "Task generated by the debugger."
        }
    ],
    "version": "2.0.0"
}
```

Change the paths to your own; only the "command" entry needs editing.

Method 2: Code Runner

With this method you can no longer build and debug via the two top-right buttons. Build and run with Run Code; to debug, open the Run and Debug panel on the left and click the green arrow next to "Run and Debug". The difference from Method 1 lies in the JSON files.

launch.json:

```json
// https://github.com/Microsoft/vscode-cpptools/blob/master/launch.md
{
    "version": "0.2.0",
    "configurations": [{
        "name": "(gdb) Launch",                // configuration name shown in the launch dropdown
        "type": "cppdbg",                      // configuration type; for cpptools debugging this is always cppdbg
        "request": "launch",                   // request type: launch or attach
        "program": "${fileDirname}/${fileBasenameNoExtension}.exe", // path of the program to debug
        "args": [],                            // command-line arguments passed to the program; usually empty
        "stopAtEntry": false,                  // true pauses at the program entry, like a breakpoint on main
        "cwd": "${workspaceFolder}",           // working directory while debugging; use ${fileDirname} for the file's own folder
        "environment": [],                     // environment variables
        "externalConsole": false,              // true opens a separate cmd window like other IDEs; since Oct 2018, false uses the integrated terminal
        "internalConsoleOptions": "neverOpen", // otherwise the Debug Console tab grabs focus; you rarely type gdb commands by hand
        "MIMode": "gdb",                       // debugger to connect to: gdb or lldb (lldb untested)
        "miDebuggerPath": "gdb.exe",           // debugger path; keep the .exe suffix on Windows, drop it on Linux
        "setupCommands": [
            {   // from the template; reportedly improves the display of STL containers
                "description": "Enable pretty-printing for gdb",
                "text": "-enable-pretty-printing",
                "ignoreFailures": false
            }
        ],
        "preLaunchTask": "Compile"             // task run before the debug session, usually the build; must match the label in tasks.json
    }]
}
```

settings.json:

```json
{
    "files.defaultLanguage": "c",  // default language for new files (Ctrl+N)
    "editor.formatOnType": true,   // auto-format the current line after typing a semicolon
    "editor.suggest.snippetsPreventQuickSuggestions": false, // clangd snippets have many tab stops; without this you must trigger IntelliSense manually
    "editor.acceptSuggestionOnEnter": "off", // personal habit: Enter always inserts a real newline, only Tab accepts IntelliSense
    // "editor.snippetSuggestions": "top",   // (optional) show snippets at the top of the completion list; default is inline
    "code-runner.runInTerminal": true,       // false prints to the Output panel, which cannot take input
    "code-runner.executorMap": {
        "c": "cd $dir && gcc '$fileName' -o '$fileNameWithoutExt.exe' -Wall -g -O2 -static-libgcc -std=c11 -fexec-charset=GBK && &'$dir$fileNameWithoutExt'",
        "cpp": "cd $dir && g++ '$fileName' -o '$fileNameWithoutExt.exe' -Wall -g -O2 -static-libgcc -std=c++11 -fexec-charset=GBK && &'$dir$fileNameWithoutExt'"
        // "c": "cd $dir && gcc $fileName -o $fileNameWithoutExt.exe -Wall -g -O2 -static-libgcc -std=c11 -fexec-charset=GBK && $dir$fileNameWithoutExt",
        // "cpp": "cd $dir && g++ $fileName -o $fileNameWithoutExt.exe -Wall -g -O2 -static-libgcc -std=c++17 -fexec-charset=GBK && $dir$fileNameWithoutExt"
    }, // commands Run Code executes; the uncommented pair is for PowerShell (Win10 default) and handles spaces in file names; the commented pair is for cmd (Win7 default), also works in PS and bash, but fails if the file name contains spaces
    "code-runner.saveFileBeforeRun": true,    // save before Run Code
    "code-runner.preserveFocus": true,        // false moves focus to the terminal after Run Code; set false if you input data often
    "code-runner.clearPreviousOutput": false, // clear Code Runner's previous terminal output before each run; default false
    "code-runner.ignoreSelection": true,      // default false lets you run a selected snippet alone, which makes no sense for a compiled language
    "C_Cpp.clang_format_sortIncludes": true,  // sort #includes alphabetically when formatting
    "files.associations": {
        "array": "cpp",
        "atomic": "cpp",
        "*.tcc": "cpp",
        "cctype": "cpp",
        "clocale": "cpp",
        "cmath": "cpp",
        "cstdarg": "cpp",
        "cstddef": "cpp",
        "cstdint": "cpp",
        "cstdio": "cpp",
        "cstdlib": "cpp",
        "cwchar": "cpp",
        "cwctype": "cpp",
        "deque": "cpp",
        "unordered_map": "cpp",
        "vector": "cpp",
        "exception": "cpp",
        "algorithm": "cpp",
        "memory": "cpp",
        "memory_resource": "cpp",
        "optional": "cpp",
        "string": "cpp",
        "string_view": "cpp",
        "system_error": "cpp",
        "tuple": "cpp",
        "type_traits": "cpp",
        "utility": "cpp",
        "fstream": "cpp",
        "initializer_list": "cpp",
        "iosfwd": "cpp",
        "iostream": "cpp",
        "istream": "cpp",
        "limits": "cpp",
        "new": "cpp",
        "ostream": "cpp",
        "sstream": "cpp",
        "stdexcept": "cpp",
        "streambuf": "cpp",
        "typeinfo": "cpp"
    }
}
```

Note that "code-runner.executorMap" holds the compile/run commands, and they set the standard to c++11. If you use the MinGW bundled with Dev-C++, you cannot use c++17; you will get "g++: error: unrecognized command line option '-std=c++17'" because its gcc version is too old.

tasks.json:

```json
// https://code.visualstudio.com/docs/editor/tasks
{
    "version": "2.0.0",
    "tasks": [{
        "label": "Compile",       // task name; must match preLaunchTask in launch.json
        "command": "g++",         // compiler to use; g++ for C++
        "args": [
            "${file}",
            "-o",                 // output file name; without it the default is a.exe (a.out on Linux)
            "${fileDirname}/${fileBasenameNoExtension}.exe",
            "-g",                 // generate debug information
            "-Wall",              // enable extra warnings
            "-static-libgcc",     // statically link libgcc; usually added
            "-fexec-charset=GBK"  // produce GBK-encoded output; without it Chinese output is garbled on Windows
            // "-std=c11",        // language standard; the latest C++ standard here is c++17; adjust as needed
        ], // the compile command; VSCode essentially types this into the terminal for you
        "type": "process",        // process passes the parsed variables straight to command; shell opens a shell first, so args get parsed one more time
        "group": {
            "kind": "build",
            "isDefault": true     // if not true you must pick the task manually on Ctrl+Shift+B
        },
        "presentation": {
            "echo": true,
            "reveal": "always",   // whether to jump to the terminal panel on task run: always, silent, or never; see the VSCode docs
            "focus": false,       // true focuses the terminal when the task runs; pointless for compiling C/C++
            "panel": "shared"     // build messages from different files share one terminal panel
        }
        // "problemMatcher": "$gcc" // captures compile errors from the terminal; with Lint enabled this may double-report
    }]
}
```

tasks.json may change after running; mine became:

```json
{
    "version": "2.0.0",
    "tasks": [
        {
            "label": "Compile",
            "command": "g++",
            "args": [
                "${file}",
                "-o",
                "${fileDirname}/${fileBasenameNoExtension}.exe",
                "-g",
                "-Wall",
                "-static-libgcc",
                "-fexec-charset=GBK"
            ],
            "type": "process",
            "group": "build",
            "presentation": {
                "echo": true,
                "reveal": "always",
                "focus": false,
                "panel": "shared"
            }
        },
        {
            "type": "cppbuild",
            "label": "C/C++: cpp.exe build active file",
            "command": "E:\\Dev-Cpp\\MinGW64\\bin\\cpp.exe",
            "args": [
                "-fdiagnostics-color=always",
                "-g",
                "${file}",
                "-o",
                "${fileDirname}\\${fileBasenameNoExtension}.exe"
            ],
            "options": {
                "cwd": "${fileDirname}"
            },
            "problemMatcher": ["$gcc"],
            "group": {
                "kind": "build",
                "isDefault": true
            },
            "detail": "Task generated by the debugger."
        }
    ]
}
```

Method 3: CMake

This also splits into two cases; the first again targets single-file builds.

Single-file build

Create a CMakeLists.txt; for its contents refer to the official tutorial (cmake.org/cmake/help/…, cmake.org/cmake/help/…). Its second line sets the project name and its third line adds the source file. Then press Ctrl+P, type ">", and choose the first entry to configure. Once configuration finishes, the status bar at the bottom left changes:

- Click "CMake" to choose Debug or another build type.
- Click "GCC" to choose the compiler.
- Click "Build" to compile.
- Click the triangle to run.

Multi-file build

The previous approach was hand-written, so it suits single files, or readers already fluent in CMake. For multi-file projects, use a project template instead: github.com/Codesire-De… (mirror: gitee.com/oi_dzf/Temp…). Copy the repository to your machine and rename it as needed. First delete the .git folder and re-initialize the repository (or create a new repository on GitHub and clone it). The template already contains a CMakeLists that builds the whole project; just put your own .cpp files under src, test, example, and so on. Whenever you create or delete a file, click "CMake" at the bottom: the first time it will ask you to pick a compiler and then a build mode (commonly Debug or Release; pick Debug). This is how CMake works: it has to rescan your project files. Next to it, "all: Default build target" selects which target to build; choose the target you just wrote (e.g. a), then click "Build". On first use an error may appear that you can ignore. After the build, the executable is under build/src.

Cosmetics

Finally, some cosmetic extensions, the first of which I especially recommend:

- Rainbow-colored bracket pairs
- Connecting guide lines between brackets
- Smoother cursor movement
- Fade-in/fade-out cursor blinking
IoU loss in semantic segmentation (PyTorch implementations)

IoU is the intersection of prediction and ground truth divided by their union. In semantic segmentation:

$$\mathrm{IoU} = \frac{\text{pixel sum of the intersection of prediction and ground truth}}{\text{pixel sum of their union}}$$

and the loss function is $1 - \mathrm{IoU}$ (or $-\mathrm{IoU}$).

First, an implementation I encountered in a paper's code:

```python
class IoU_loss(torch.nn.Module):
    def __init__(self):
        super(IoU_loss, self).__init__()

    def forward(self, pred, target):
        b = pred.shape[0]
        IoU = 0.0
        for i in range(0, b):
            # compute the IoU of the foreground
            Iand1 = torch.sum(target[i, :, :, :] * pred[i, :, :, :])
            Ior1 = torch.sum(target[i, :, :, :]) + torch.sum(pred[i, :, :, :]) - Iand1
            IoU1 = Iand1 / (Ior1 + 1e-5)
            # IoU loss is (1 - IoU1)
            IoU = IoU + (1 - IoU1)
        return IoU / b
```

This version computes each predicted image separately (note that the input predictions must be passed through sigmoid beforehand). For each prediction:

1. Multiply prediction and ground truth to get the intersection, then sum to get the intersection's pixel total.
2. Union = prediction pixel sum + ground-truth pixel sum − intersection pixel sum.
3. Intersection / union, with 1e-5 added to the denominator to avoid division by zero.
4. 1 − IoU.

Finally, average the IoU loss over all predictions.

The second version, from another paper's code (the original snippet was missing a colon, took an unused `epoch` argument, and used `weit` without defining it, so here `weit` is made an explicit parameter):

```python
def iou(pred, mask, weit, epsilon=1):
    pred = torch.sigmoid(pred)
    inter = ((pred * mask) * weit).sum(dim=(2, 3))
    union = ((pred + mask) * weit).sum(dim=(2, 3))
    wiou = 1 - (inter + epsilon) / (union - inter + epsilon)
    return wiou.mean()
```

This version handles all predicted images at once:

1. Compute the intersection: prediction times ground truth, weighted by `weit`, summed over dimensions 2 and 3 (image height and width).
2. Compute the "union" (strictly it is the union plus an extra copy of the intersection, hence the `union - inter` in the next step): prediction plus ground truth, weighted by `weit`, summed the same way.
3. Compute the IoU loss, where `epsilon` (set to 1) is a small constant guarding against a zero denominator. It does not feel that small to me, so I always replace it by adding 1e-5 to the denominator instead.

`weit` is a per-pixel weight map; the first version cannot easily apply such weighting.
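A quick smoke test of the two losses (a sketch assuming both definitions above are in scope; the shapes are arbitrary and `weit` is set to all ones, and the values still differ slightly because the second version uses epsilon=1):

```python
import torch

logits = torch.randn(4, 1, 64, 64)                  # raw network outputs (B, 1, H, W)
target = (torch.rand(4, 1, 64, 64) > 0.5).float()   # binary ground-truth masks
weit = torch.ones_like(target)                      # uniform weights for this check

loss1 = IoU_loss()(torch.sigmoid(logits), target)   # first version expects sigmoid-ed input
loss2 = iou(logits, target, weit)                   # second version applies sigmoid itself
print(loss1.item(), loss2.item())                   # both roughly 1 - IoU of a random guess
```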
Problem: ImportError: libGL.so.1: cannot open shared object file: no such file or directory

(Reference: "[Solved] Importerror: libgl.so.1: cannot open shared object file: no such file or directory" - ItsMyCode)

This error appears when using OpenCV after installing it. The dependencies cv2 needs are mostly already present on your local machine, which is why the application runs fine there; but when you perform a docker build from a Python base image, they are missing and you hit this error. The reference offers four solutions, translated and summarized below.

1. Install cv2's dependencies

The easier fix is to update the package lists and install the additional system packages that cv2 requires to run properly. Just add the lines below to your Dockerfile:

```bash
apt-get update
apt-get install ffmpeg libsm6 libxext6 -y
```

2. Install the python3-opencv package

If you don't want to install the dependencies manually, the better way is to install python3-opencv, which ensures all the related system dependencies are installed correctly while building the docker container. Add the lines below to your Dockerfile before installing the other packages in requirements.txt:

```bash
apt-get update && apt-get install -y python3-opencv
pip install opencv-python
```

3. Install opencv-python-headless

Instead of opencv-python, you can install opencv-python-headless, which ships a precompiled binary wheel with no external dependencies (other than numpy) and is intended for headless environments like Docker. Compared with python3-opencv it is a much more lightweight package and reduces the docker image size by about 700 MB. (Note: the original also listed an `apt-get install opencv-python-headless` line, but this is a pip package, not an apt one.)

```bash
pip install opencv-python-headless
```

4. Install only the missing dependency (libgl1)

All the solutions above install cv2's dependencies and therefore grow the image. If you do not want to increase the image size, you can resolve the error by installing only libgl1, as below. This is not the recommended solution, but it works if all you are hitting is the ImportError:

```bash
apt-get update && apt-get install libgl1
```

Summary

We can resolve this error by installing the additional dependencies that cv2 requires, or we can just use one of the packages such as python3-opencv or opencv-python-headless, which will install all the related dependencies and resolve the error.
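Whichever solution you pick, a bare import inside the container is enough to verify the fix (a hypothetical check, not from the original post):

```python
import cv2  # raises ImportError: libGL.so.1 ... again if the fix did not take
print(cv2.__version__)
```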
binary_cross_entropy and binary_cross_entropy_with_logits are both functions from torch.nn.functional. First compare how the official docs distinguish them:

| Function | Description |
| --- | --- |
| binary_cross_entropy | Function that measures the Binary Cross Entropy between the target and the output |
| binary_cross_entropy_with_logits | Function that measures Binary Cross Entropy between target and output logits |

The only difference is "logits", so what does that mean? An answer found online: a loss function whose name carries "with_logits" handles the logit computation internally, so there is no need to map the preceding network's output into [0, 1] with sigmoid/softmax before passing it to the loss.

Now look at the official example code.

binary_cross_entropy:

```python
input = torch.randn((3, 2), requires_grad=True)
target = torch.rand((3, 2), requires_grad=False)
loss = F.binary_cross_entropy(F.sigmoid(input), target)
loss.backward()
# input is tensor([[-0.5474,  0.2197],
#                  [-0.1033, -1.3856],
#                  [-0.2582, -0.1918]], requires_grad=True)
# target is tensor([[0.7867, 0.5643],
#                   [0.2240, 0.8263],
#                   [0.3244, 0.2778]])
# loss is tensor(0.8196, grad_fn=<BinaryCrossEntropyBackward>)
```

binary_cross_entropy_with_logits:

```python
input = torch.randn(3, requires_grad=True)
target = torch.empty(3).random_(2)
loss = F.binary_cross_entropy_with_logits(input, target)
loss.backward()
# input is tensor([ 1.3210, -0.0636,  0.8165], requires_grad=True)
# target is tensor([0., 1., 1.])
# loss is tensor(0.8830, grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
```

Indeed, binary_cross_entropy_with_logits needs no sigmoid. In fact, the official docs recommend the with_logits variant, explaining: "This loss combines a Sigmoid layer and the BCELoss in one single class. This version is more numerically stable than using a plain Sigmoid followed by a BCELoss as, by combining the operations into one layer, we take advantage of the log-sum-exp trick for numerical stability."

In other words, computing the sigmoid layer and the binary cross entropy together is more numerically stable than computing them one after the other, thanks mainly to the log-sum-exp trick. The trick addresses overflow in numerical computation:

$$\operatorname{logsumexp}(x_1, x_2, \ldots, x_n) = \log\left(\sum_{i=1}^{n} e^{x_i}\right)$$

In this expression, if some $x_i$ is large, $e^{x_i}$ can easily overflow. To avoid this, the expression can be transformed with a constant $c$ (typically $c = \max_i x_i$):

$$\log\left(\sum_{i=1}^{n} e^{x_i}\right) = \log\left(e^{c}\sum_{i=1}^{n} e^{x_i - c}\right) = c + \log\left(\sum_{i=1}^{n} e^{x_i - c}\right)$$

Since every exponent $x_i - c \le 0$, no term can overflow, and the problem is avoided.
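A small sketch of the stability difference: the logit of -200 is deliberately extreme, and the value the naive path returns reflects PyTorch's documented clamping of the BCE log terms at -100; the exact numbers may vary across versions and dtypes.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-200.0])  # extreme logit; sigmoid(x) underflows to 0.0 in float32
t = torch.tensor([1.0])     # target 1 -> the true loss is -log(sigmoid(x)) = 200

naive = F.binary_cross_entropy(torch.sigmoid(x), t)  # log clamped internally -> 100.0, inexact
stable = F.binary_cross_entropy_with_logits(x, t)    # computed via log-sum-exp -> 200.0, exact
print(naive.item(), stable.item())
```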