Paper 6: Siamese Image Modeling for Self-Supervised Vision Representation Learning
- Authors: Chenxin Tao et al.
- Paper: https://arxiv.org/abs/2206.01204
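Abstract: The researchers propose Siamese Image Modeling (SIM), which uses one masked augmented view of an image to predict the dense feature representations of another augmented view of the same image. To do this, SIM adopts a Siamese network with two branches, an online branch and a target branch. The online branch first maps the first (masked) view into feature space. It then predicts the features of the second view from the first view's features and the relative position coordinates between the two views. The target branch maps the second view into feature space to produce the target features.
In this way, SIM matches instance discrimination (ID) methods on linear classification and matches masked image modeling (MIM) methods on detection. The researchers further find that SIM achieves strong linear-classification performance even without a global loss function.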
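The online/target setup above can be made concrete with a toy, dependency-free Python sketch. It shows the bookkeeping behind the "relative position coordinates": view-2 patch centers are expressed in view 1's normalized frame, so the online branch can be queried at locations it did not see. Everything here is hypothetical: the (x0, y0, w, h) crop convention, the distance-weighted attention that stands in for the learned decoder, the negative-cosine loss, and the EMA target update are simplifications chosen for illustration, not details taken from the paper.

```python
import math

# Illustrative sketch only. Helper names, the crop-box convention, the toy predictor,
# the loss, and the EMA update are assumptions, not the SIM authors' implementation.
# A crop (view) is (x0, y0, w, h) in original-image pixels, split into grid x grid patches.


def patch_centers(box, grid):
    """Patch-center coordinates of a crop, in original-image pixels (row-major order)."""
    x0, y0, w, h = box
    return [(x0 + (j + 0.5) * w / grid, y0 + (i + 0.5) * h / grid)
            for i in range(grid) for j in range(grid)]


def relative_positions(ref_box, box, grid):
    """Patch centers of `box` expressed in the normalized frame of `ref_box` (view 1)."""
    x0, y0, w, h = ref_box
    return [((cx - x0) / w, (cy - y0) / h) for cx, cy in patch_centers(box, grid)]


def toy_predictor(feats1, pos1, queries, temperature=0.05):
    """Stand-in for the learned online decoder: each view-2 query position attends
    to the visible view-1 features with softmax(-squared distance / temperature)."""
    dim = len(feats1[0])
    preds = []
    for qx, qy in queries:
        logits = [-((qx - px) ** 2 + (qy - py) ** 2) / temperature for px, py in pos1]
        m = max(logits)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in logits]
        z = sum(weights)
        preds.append([sum(w * f[d] for w, f in zip(weights, feats1)) / z
                      for d in range(dim)])
    return preds


def cosine(a, b, eps=1e-8):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norms + eps)


def dense_loss(preds, targets):
    """Negative cosine similarity averaged over view-2 patches (assumed loss form)."""
    return -sum(cosine(p, t) for p, t in zip(preds, targets)) / len(preds)


def ema(target_params, online_params, momentum=0.996):
    """Target branch follows the online branch by exponential moving average (assumed)."""
    return [momentum * t + (1 - momentum) * o for t, o in zip(target_params, online_params)]


if __name__ == "__main__":
    grid = 2
    view1, view2 = (0, 0, 100, 100), (50, 50, 100, 100)  # two overlapping crops
    pos1_all = relative_positions(view1, view1, grid)    # view-1 patches in their own frame
    visible = [0, 3]                                     # masking hides patches 1 and 2
    feats1 = [[1.0, 0.0], [0.0, 1.0]]                    # pretend online-encoder outputs
    queries = relative_positions(view1, view2, grid)     # view-2 patches relative to view 1
    preds = toy_predictor(feats1, [pos1_all[i] for i in visible], queries)
    targets = [[0.0, 1.0]] * len(queries)                # pretend target-branch features
    print("loss:", dense_loss(preds, targets))
```

In the demo, view 2 overlaps only the lower-right quarter of view 1, so some query coordinates exceed 1.0, i.e. they lie outside view 1's frame.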
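Comparison of the ID, MIM, and SIM frameworks.
Overview of Siamese Image Modeling.
Comparison of SIM with other methods on ViT-B/16.
Recommended: How can self-supervised learning achieve both semantic alignment and spatial discrimination? Tsinghua University and SenseTime propose SIM.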
论文 7:FlowBot3D: Learning 3D Articulation Flow to Manipulate Articulated Objects
- Authors: Ben Eisner et al.
- Paper: https://arxiv.org/pdf/2205.04382.pdf
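Abstract: Ben Eisner and Harry Zhang, two students in Professor David Held's R-PAD lab at the CMU Robotics Institute, recently achieved a breakthrough in manipulating complex articulated objects. They introduce FlowBot3D, an algorithm built on 3D neural networks that effectively represents and predicts the motion of the parts of articulated objects such as everyday furniture. The algorithm has two parts.
The first part is perception: a 3D deep neural network predicts instantaneous 3D motion trajectories from point-cloud data of the furniture being manipulated. The second part is the policy, which uses the predicted 3D Articulation Flow to choose the robot's next action.
Both parts are learned entirely in simulation and can be deployed directly in the real world without retraining or fine-tuning. With FlowBot3D, a robot can manipulate articulated objects such as everyday furniture much as a human would.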
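The perception/policy split described above can be illustrated with a small Python sketch of the policy step alone. It assumes the perception network has already produced one 3D flow vector per point and applies a simple greedy rule: grasp the point with the largest predicted flow and move a fixed step along that flow's direction. The rule, the 0.05 step size, and the helper `revolute_flow` (which fabricates a toy flow field for a hinged door in place of a network prediction) are assumptions for illustration, not FlowBot3D's released code.

```python
import math

# Illustrative sketch of a greedy policy on top of a predicted per-point 3D flow field.
# The selection rule, step size, and revolute_flow helper are assumptions, not the
# FlowBot3D authors' code.


def norm(v):
    return math.sqrt(sum(c * c for c in v))


def revolute_flow(point, axis_origin, axis_dir):
    """Toy stand-in for the perception network: velocity of `point` under unit angular
    velocity about a hinge, axis_dir x (point - axis_origin); axis_dir should be unit length."""
    d = tuple(p - o for p, o in zip(point, axis_origin))
    a = axis_dir
    return (a[1] * d[2] - a[2] * d[1],
            a[2] * d[0] - a[0] * d[2],
            a[0] * d[1] - a[1] * d[0])


def select_action(points, flows, step=0.05):
    """Pick the contact point with the largest flow and a displacement along its direction.

    points: (x, y, z) coordinates from the point cloud
    flows:  predicted (dx, dy, dz) articulation flow, one per point
    step:   hypothetical end-effector step length (e.g. meters)
    """
    if not points or len(points) != len(flows):
        raise ValueError("need a non-empty point cloud with one flow vector per point")
    best = max(range(len(points)), key=lambda i: norm(flows[i]))
    f = flows[best]
    n = norm(f)
    if n == 0.0:
        return points[best], (0.0, 0.0, 0.0)  # nothing is predicted to move
    return points[best], tuple(step * c / n for c in f)


if __name__ == "__main__":
    # A door hinged along the z-axis through the origin; points lie along the door's width.
    door = [(0.1, 0.0, 0.5), (0.4, 0.0, 0.5), (0.8, 0.0, 0.5)]
    flows = [revolute_flow(p, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)) for p in door]
    contact, move = select_action(door, flows)
    print("grasp at", contact, "move by", move)
```

For a rigid rotation, a point's speed is proportional to its distance from the hinge axis, so in this toy door the greedy rule picks the point farthest from the hinge, where a pull has the most leverage. Calling `select_action` repeatedly on fresh predictions would give a simple closed-loop controller.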
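The two modules of FlowBot3D.
Opening a refrigerator door.
Opening a toilet lid.
Recommended: CMU publishes a new dexterous-robot algorithm that accurately learns how to manipulate everyday furniture.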
ArXiv Weekly Radiostation
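Synced, together with ArXiv Weekly Radiostation, initiated by Chu Hang and Luo Ruotian, builds on 7 Papers to select more important papers from this week: 10 picks each in NLP, CV, and ML, along with audio summaries of the papers. Details are below:
10 NLP Papers (audio)
The 10 selected NLP papers this week are: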
1. Unsupervised Key Event Detection from Massive Text Corpora. (from Jiawei Han)
2. Beyond Opinion Mining: Summarizing Opinions of Customer Reviews. (from Bing Liu)
3. Words are all you need? Capturing human sensory similarity with textual descriptors. (from Thomas L. Griffiths)
4. Face-Dubbing++: Lip-Synchronous, Voice Preserving Translation of Videos. (from Alexander Waibel)
5. Plumber: A Modular Framework to Create Information Extraction Pipelines. (from Sören Auer)
6. LegoNN: Building Modular Encoder-Decoder Models. (from Abdelrahman Mohamed)
7. Latent Topology Induction for Understanding Contextualized Representations. (from Mirella Lapata)
8. Annotation Error Detection: Analyzing the Past and Present for a More Coherent Future. (from Bonnie Webber)
9. Topic-Aware Evaluation and Transformer Methods for Topic-Controllable Summarization. (from Grigorios Tsoumakas)
10. Factuality Enhanced Language Models for Open-Ended Text Generation. (from Bryan Catanzaro)
The 10 selected CV papers this week are:
1. PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images. (from Xiangyu Zhang, Jian Sun)
2. Revisiting the "Video" in Video-Language Understanding. (from Li Fei-Fei)
3. PrivHAR: Recognizing Human Actions From Privacy-preserving Lens. (from Li Fei-Fei)
4. Compositional Visual Generation with Composable Diffusion Models. (from Antonio Torralba, Joshua B. Tenenbaum)
5. Polymorphic-GAN: Generating Aligned Samples across Multiple Domains with Learned Morph Maps. (from Antonio Torralba)
6. Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval. (from Shih-Fu Chang)
7. Beyond RGB: Scene-Property Synthesis with Neural Radiance Fields. (from Martial Hebert)
8. Generating Long Videos of Dynamic Scenes. (from Alexei A. Efros)
9. STIP: A SpatioTemporal Information-Preserving and Perception-Augmented Model for High-Resolution Video Prediction. (from Wen Gao)
10. Hierarchical Similarity Learning for Aliasing Suppression Image Super-Resolution. (from Wen Gao)
The 10 selected ML papers this week are:
1. Schema-Guided Event Graph Completion. (from Jiawei Han)
2. BaCaDI: Bayesian Causal Discovery with Unknown Interventions. (from Bernhard Schölkopf, Andreas Krause)
3. Causal Discovery in Heterogeneous Environments Under the Sparse Mechanism Shift Hypothesis. (from Bernhard Schölkopf)
4. Rethinking and Scaling Up Graph Contrastive Learning: An Extremely Efficient Approach with Group Discrimination. (from Philip S. Yu)
5. DORA: Exploring outlier representations in Deep Neural Networks. (from Klaus-Robert Müller)
6. Imitating Past Successes can be Very Suboptimal. (from Sergey Levine, Ruslan Salakhutdinov)
7. Towards Understanding Why Mask-Reconstruction Pretraining Helps in Downstream Tasks. (from Shuicheng Yan)
8. From "Where" to "What": Towards Human-Understandable Explanations through Concept Relevance Propagation. (from Thomas Wiegand)
9. Expressiveness and Learnability: A Unifying View for Evaluating Self-Supervised Learning. (from Aaron Courville)
10. Beyond Tabula Rasa: Reincarnating Reinforcement Learning. (from Aaron Courville, Marc G. Bellemare)