
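Summary: Zero-shot learning is the problem of predicting instances of classes not seen during training. One approach to zero-shot learning is to provide the model with auxiliary class information. Prior work has largely used expensive per-instance annotations or single class-level descriptions, but per-instance descriptions are hard to scale and a single class description may not be rich enough.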

CV - Computer Vision | ML - Machine Learning | RL - Reinforcement Learning | NLP - Natural Language Processing

Subjects: cs.CV, cs.CL, cs.LG

1. SemSup: Semantic Supervision for Simple and Scalable Zero-shot Generalization



Authors: Austin W. Hanjie, Ameet Deshpande, Karthik Narasimhan






Zero-shot learning is the problem of predicting instances over classes not seen during training. One approach to zero-shot learning is providing auxiliary class information to the model. Prior work in this vein has largely used expensive per-instance annotation or singular class-level descriptions, but per-instance descriptions are hard to scale and single class descriptions may not be rich enough. Furthermore, these works have used natural-language descriptions exclusively, simple bi-encoder models, and modality- or task-specific methods. These approaches have several limitations: text supervision may not always be available or optimal, and bi-encoders may only learn coarse relations between inputs and class descriptions. In this work, we present SemSup, a novel approach that uses (1) a scalable multiple description sampling method which improves performance over single descriptions, (2) alternative description formats such as JSON that are easy to generate and outperform text in certain settings, and (3) hybrid lexical-semantic similarity to leverage fine-grained information in class descriptions. We demonstrate the effectiveness of SemSup across four datasets, two modalities, and three generalization settings. For example, across text and image datasets, SemSup increases unseen class generalization accuracy by 15 points on average compared to the closest baseline.
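
To make the idea of scoring inputs against sampled class descriptions concrete, here is a minimal sketch. It is not the SemSup implementation: a bag-of-words cosine score stands in for the learned encoders and the hybrid lexical-semantic similarity, and the class names, descriptions, and function names are all hypothetical.

```python
import math
import random
from collections import Counter


def bow(text):
    """Lower-cased bag-of-words vector (assumption: stands in for a learned encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def predict(text, class_descriptions, rng):
    """Sample one description per class, then return the best-scoring class."""
    x = bow(text)
    scores = {}
    for label, descriptions in class_descriptions.items():
        sampled = rng.choice(descriptions)  # multiple descriptions, one drawn per call
        scores[label] = cosine(x, bow(sampled))
    return max(scores, key=scores.get)


# Hypothetical unseen classes, each with several descriptions.
unseen_classes = {
    "sports": [
        "news about football matches and teams",
        "reports on athletes and football games",
    ],
    "finance": [
        "news about stock markets and banks",
        "reports on markets and interest rates",
    ],
}

prediction = predict(
    "the football teams played two matches", unseen_classes, random.Random(0)
)
print(prediction)
```

Because classes are represented only by their descriptions, new classes can be added at prediction time by extending the dictionary, without retraining anything in this toy setup.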

2. Continual Few-Shot Learning Using HyperTransformers



Authors: Max Vladymyrov, Andrey Zhmoginov, Mark Sandler






We focus on the problem of learning without forgetting from multiple tasks arriving sequentially, where each task is defined using a few-shot episode of novel or already seen classes. We approach this problem using the recently published HyperTransformer (HT), a Transformer-based hypernetwork that generates specialized task-specific CNN weights directly from the support set. In order to learn from a continual sequence of tasks, we propose to recursively re-use the generated weights as input to the HT for the next task. This way, the generated CNN weights themselves act as a representation of previously learned tasks, and the HT is trained to update these weights so that the new task can be learned without forgetting past tasks. This approach is different from most continual learning algorithms that typically rely on using replay buffers, weight regularization or task-dependent architectural changes. We demonstrate that our proposed Continual HyperTransformer method equipped with a prototypical loss is capable of learning and retaining knowledge about past tasks for a variety of scenarios, including learning from mini-batches, and task-incremental and class-incremental learning scenarios.
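
The recursive pattern described in the abstract, where the weights produced for one task are fed back in as input for the next, can be sketched as follows. This is only an illustration of the control flow: a running-sum class-prototype update stands in for the Transformer-based hypernetwork, features are single floats rather than images, and `update_weights`, `classify`, and the task data are hypothetical.

```python
def update_weights(prev_weights, support_set):
    """Stand-in for the HT: build new weights from old weights plus a support set.

    Assumption: 'weights' are per-class (feature_sum, count) pairs over
    1-D features, not generated CNN weights.
    """
    new_weights = dict(prev_weights)  # carry over what earlier tasks contributed
    for feature, label in support_set:
        total, count = new_weights.get(label, (0.0, 0))
        new_weights[label] = (total + feature, count + 1)
    return new_weights


def classify(weights, feature):
    """Nearest-prototype prediction over every class seen so far."""
    return min(
        weights,
        key=lambda label: abs(weights[label][0] / weights[label][1] - feature),
    )


# Two few-shot episodes arriving in sequence; class "b" appears in both.
tasks = [
    [(0.0, "a"), (0.2, "a"), (1.0, "b")],
    [(5.0, "c"), (5.2, "c"), (1.2, "b")],
]

weights = {}
for support_set in tasks:
    # The previous output is the next input: no replay buffer is kept.
    weights = update_weights(weights, support_set)

print(sorted(weights))
print(classify(weights, 0.1), classify(weights, 5.1))
```

In this toy version nothing is ever forgotten because the update only accumulates; in the paper's setting the update is learned, so retaining earlier tasks is something the training objective has to enforce.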

3. Universal Domain Adaptation for Remote Sensing Image Scene Classification



Authors: Qingsong Xu, Yilei Shi, Xin Yuan, Xiao Xiang Zhu






The domain adaptation (DA) approaches available to date are usually not well suited for practical DA scenarios of remote sensing image classification, since these methods (such as unsupervised DA) rely on rich prior knowledge about the relationship between label sets of source and target domains, and source data are often not accessible due to privacy or confidentiality issues. To this end, we propose a practical universal domain adaptation setting for remote sensing image scene classification that requires no prior knowledge of the label sets. Furthermore, a novel universal domain adaptation method without source data is proposed for cases when the source data are unavailable. The architecture of the model is divided into two parts: the source data generation stage and the model adaptation stage. The first stage estimates the conditional distribution of source data from the pre-trained model using the knowledge of class-separability in the source domain and then synthesizes the source data. With this synthetic source data in hand, it becomes a universal DA task to classify a target sample correctly if it belongs to any category in the source label set, or mark it as "unknown" otherwise. In the second stage, a novel transferable weight that distinguishes the shared and private label sets in each domain promotes the adaptation in the automatically discovered shared label set and recognizes the "unknown" samples successfully. Empirical results show that the proposed model is effective and practical for remote sensing image scene classification, regardless of whether the source data are available or not.
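
The decision rule at the heart of the universal DA task, predicting a source class when the sample fits one and "unknown" otherwise, can be illustrated with a short sketch. Note the substitution: the paper uses a learned transferable weight to separate shared from private classes, whereas this example uses a fixed softmax-confidence threshold. The label names, logits, and the `threshold` value are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_or_unknown(logits, source_labels, threshold=0.5):
    """Return the top source label, or "unknown" when confidence is too low.

    Assumption: a fixed confidence threshold replaces the paper's
    transferable weight.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return source_labels[best] if probs[best] >= threshold else "unknown"


source_labels = ["airport", "forest", "harbor"]

confident = predict_or_unknown([4.0, 0.5, 0.2], source_labels)
uncertain = predict_or_unknown([1.0, 0.9, 1.1], source_labels)
print(confident, uncertain)
```

A fixed threshold is the simplest possible choice and needs tuning per dataset; it is shown here only to make the shared-versus-unknown split tangible.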
