CV - Computer Vision | ML - Machine Learning | RL - Reinforcement Learning | NLP - Natural Language Processing
Subjects: cs.CL, cs.AI, cs.CV
1. BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining
Title: BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining
Authors: Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon, Tie-Yan Liu
Paper: https://arxiv.org/abs/2210.10341v2
Code: https://github.com/microsoft/BioGPT
Abstract:
Pre-trained language models have attracted increasing attention in the biomedical domain, inspired by their great success in the general natural language domain. Among the two main branches of pre-trained language models in the general language domain, i.e., BERT (and its variants) and GPT (and its variants), the first one has been extensively studied in the biomedical domain, such as BioBERT and PubMedBERT. While they have achieved great success on a variety of discriminative downstream biomedical tasks, the lack of generation ability constrains their application scope. In this paper, we propose BioGPT, a domain-specific generative Transformer language model pre-trained on large scale biomedical literature. We evaluate BioGPT on six biomedical NLP tasks and demonstrate that our model outperforms previous models on most tasks. Especially, we get 44.98%, 38.42% and 40.76% F1 score on BC5CDR, KD-DTI and DDI end-to-end relation extraction tasks respectively, and 78.2% accuracy on PubMedQA, creating a new record. Our larger model BioGPT-Large achieves 81.0% on PubMedQA. Our case study on text generation further demonstrates the advantage of BioGPT on biomedical literature to generate fluent descriptions for biomedical terms. Code is available at https://github.com/microsoft/BioGPT.
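The abstract frames end-to-end relation extraction as a generation task: the model emits the extracted relations as text, which is then parsed back into structured triples. As a minimal sketch of that generative formulation (the angle-bracket template and the example triples below are illustrative assumptions for demonstration, not the paper's actual target format), a serializer/parser pair might look like:

```python
import re

def triples_to_target(triples):
    """Serialize (head, relation, tail) triples into a flat target string
    that a generative LM could be trained to emit."""
    return "; ".join(f"<{h}, {r}, {t}>" for h, r, t in triples)

def target_to_triples(text):
    """Parse a generated string back into triples; malformed spans are skipped."""
    return [tuple(part.strip() for part in m.split(","))
            for m in re.findall(r"<([^>]+)>", text)
            if m.count(",") == 2]

triples = [("aspirin", "inhibitor", "COX-1"), ("warfarin", "interacts", "aspirin")]
encoded = triples_to_target(triples)
print(target_to_triples(encoded) == triples)  # round-trips: True
```

The appeal of this formulation is that extraction reduces to ordinary sequence generation, so a single pre-trained decoder can be fine-tuned for it without task-specific heads.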
2. Multimodal Chain-of-Thought Reasoning in Language Models
Title: Multimodal Chain-of-Thought Reasoning in Language Models
Authors: Zhuosheng Zhang, Aston Zhang, Mu Li, Hai Zhao, George Karypis, Alex Smola
Paper: https://arxiv.org/abs/2302.00923v1
Code: https://github.com/amazon-science/mm-cot
Abstract:
Large language models (LLMs) have shown impressive performance on complex reasoning by leveraging chain-of-thought (CoT) prompting to generate intermediate reasoning chains as the rationale to infer the answer. However, existing CoT studies are mostly isolated in the language modality with LLMs, where LLMs are hard to deploy. To elicit CoT reasoning in multimodality, a possible solution is to fine-tune small language models by fusing the vision and language features to perform CoT reasoning. The key challenge is that those language models tend to generate hallucinated reasoning chains that mislead the answer inference. To mitigate the effect of such mistakes, we propose Multimodal-CoT that incorporates vision features in a decoupled training framework. The framework separates the rationale generation and answer inference into two stages. By incorporating the vision features in both stages, the model is able to generate effective rationales that contribute to answer inference. With Multimodal-CoT, our model under 1 billion parameters outperforms the previous state-of-the-art LLM (GPT-3.5) by 16% (75.17%->91.68%) on the ScienceQA benchmark and even surpasses human performance. Code is publicly available at https://github.com/amazon-science/mm-cot.
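The decoupled two-stage framework described above can be sketched as a plain inference pipeline: stage one generates a rationale from the language and vision inputs, and stage two infers the answer conditioned on the question augmented with that rationale. The function below is a minimal illustration; the model stubs, the prompt concatenation, and the `Rationale:` marker are assumptions for demonstration, not the paper's implementation:

```python
from typing import Callable

def multimodal_cot(question: str,
                   vision_features: list[float],
                   rationale_model: Callable[[str, list[float]], str],
                   answer_model: Callable[[str, list[float]], str]) -> tuple[str, str]:
    """Two-stage Multimodal-CoT-style inference with vision features in both stages."""
    # Stage 1: rationale generation sees the question and the vision features.
    rationale = rationale_model(question, vision_features)
    # Stage 2: answer inference sees the question augmented with the generated
    # rationale (plus the same vision features), grounding the answer in it.
    answer = answer_model(f"{question} Rationale: {rationale}", vision_features)
    return rationale, answer

# Stub models standing in for the fine-tuned small language models.
stub_rationale = lambda text, vis: "The magnet attracts iron filings."
stub_answer = lambda text, vis: "(A)" if "Rationale:" in text else "(B)"

r, a = multimodal_cot("Which object is magnetic?", [0.1, 0.7],
                      stub_rationale, stub_answer)
print(a)  # stage 2 received the rationale, so the stub returns "(A)"
```

Separating the stages means the answer model is trained on rationales of the kind it will actually see at inference time, which is the mechanism the abstract credits for reducing the impact of hallucinated reasoning chains.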
3. Semantic Coherence Markers for the Early Diagnosis of the Alzheimer Disease
Title: Semantic Coherence Markers for the Early Diagnosis of the Alzheimer Disease
Authors: Davide Colla, Matteo Delsanto, Marco Agosto, Benedetto Vitiello, Daniele Paolo Radicioni
Paper: https://arxiv.org/abs/2302.01025v1
Code: https://github.com/davidecolla/semantic_coherence_markers
Abstract:
In this work we explore how language models can be employed to analyze language and discriminate between mentally impaired and healthy subjects through the perplexity metric. Perplexity was originally conceived as an information-theoretic measure to assess how well a given language model predicts a text sequence or, equivalently, how well a word sequence fits a specific language model. We carried out extensive experimentation with publicly available data, and employed language models as diverse as N-grams, from 2-grams to 5-grams, and GPT-2, a transformer-based language model. We investigated whether perplexity scores may be used to discriminate between the transcripts of healthy subjects and subjects suffering from Alzheimer Disease (AD). Our best performing models achieved full accuracy and F-score (1.00 in both precision/specificity and recall/sensitivity) in categorizing subjects from both the AD class and control subjects. These results suggest that perplexity can be a valuable analytical metric with potential application to supporting early diagnosis of symptoms of mental disorders.
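As a concrete illustration of the perplexity metric this paper relies on, here is a minimal add-one-smoothed bigram model (the toy corpus is invented for demonstration, and the paper's actual N-gram and GPT-2 models are far larger): a sequence that fits the training distribution poorly receives a higher perplexity.

```python
import math
from collections import Counter

def bigram_perplexity(train_tokens, test_tokens):
    """Perplexity of a test sequence under an add-one-smoothed bigram model."""
    vocab = set(train_tokens) | set(test_tokens)
    V = len(vocab)
    unigrams = Counter(train_tokens)
    bigrams = Counter(zip(train_tokens, train_tokens[1:]))

    log_prob = 0.0
    n = 0
    for prev, curr in zip(test_tokens, test_tokens[1:]):
        # P(curr | prev) with Laplace (add-one) smoothing
        p = (bigrams[(prev, curr)] + 1) / (unigrams[prev] + V)
        log_prob += math.log(p)
        n += 1
    # Perplexity = exp of the average negative log-likelihood per bigram.
    return math.exp(-log_prob / n)

train = "the patient shows signs of memory loss".split()
fluent = "the patient shows memory loss".split()
scrambled = "loss the of signs memory".split()

# The coherent sequence is closer to the training distribution,
# so its perplexity is lower than the scrambled one's.
print(bigram_perplexity(train, fluent) < bigram_perplexity(train, scrambled))  # True
```

The diagnostic idea in the paper is exactly this comparison scaled up: transcripts from impaired speakers drift from the distribution a model learned from ordinary language, and that drift shows up as elevated perplexity.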