论文推荐:大型语言模型能自我解释吗?

本文涉及的产品
实时数仓Hologres,5000CU*H 100GB 3个月
实时计算 Flink 版,5000CU*H 3个月
智能开放搜索 OpenSearch行业算法版,1GB 20LCU 1个月
简介: 这篇论文的研究主要贡献是对LLM生成解释的优缺点进行了调查。详细介绍了两种方法,一种是做出预测,然后解释它,另一种是产生解释,然后用它来做出预测。

最近的研究发现,即使LLM是在特定数据上训练的,也不能认识到训练的知识与推理上下文之间的联系。

因此一些人认为之为“X链”的方法非常重要。因为要求LLM将任务分解为思维链时,LLM在解决任务的同时检索所训练的现有知识方面表现更好。

LLM有没有能力回答问题,并提供一个解释如何得出结论。或者收到提示词后,LLM有没有能力分解他们的答案?

这篇论文使用两种方法来测试LLM的能力:

做出预测然后解释,或者产生一个解释,并用它来做出预测。

方法对比

思想链生成已被证明是一种较为敏捷的快速工程技术,特别是思想链还有很多优化的变体。思想链对于生成准确的答案是有效的,特别是对于复杂的推理任务,如解决数学问题或复杂的推理任务。

LLM-Generated Self-Explanations论文在情感分析领域对LLM生成的自我解释进行系统分析。

1、解释然后预测 :Explanation To Prediction

下面的提示是E-P配置的一个示例。左边是SYSTEM描述,其中包含USER请求和LLM生成的ASSISTANT响应。在这个例子中,使用gpt-3.5 turbo,右边是标准模型设置。

上面的Python代码如下:

 from openai import OpenAI
 client = OpenAI()

 response = client.chat.completions.create(
   model="gpt-3.5-turbo",
   messages=[
     {
       "role": "system",
       "content": "You are a creative and intelligent movie review analyst, whose purpose is to aid in sentiment analysis of movie reviews. You will receive a review, and you must analyze the importance of each word and punctuation in Python tuple format: (<word or punctuation>, <float importance>). Each word or punctuation is separated by a space. The importance should be a decimal number to three decimal places ranging from -1 to 1, with -1 implying a negative sentiment and 1 implying a positive sentiment. Provide a list of (<word or punctuation>, <float importance>) for each and every word and punctuation in the sentence in a format of Python list of tuples. Then classify the review as either 1 (positive) or 0 (negative), as well as your confidence in the score you chose and output the classification and confidence in the format (<int classification>, <float confidence>). The confidence should be a decimal number between 0 and 1, with 0 being the lowest confidence and 1 being the highest confidence.\nIt does not matter whether or not the sentence makes sense. Do your best given the sentence.\nThe movie review will be encapsulated within <review> tags. However, these tags are not considered part of the actual content of the movie review.\nExample output:\n[(<word or punctuation>, <float importance>), (<word or punctuation>, <float importance>), ... ] (<int classification>, <float confidence>)"
     },
     {
       "role": "user",
       "content": "<review> Offers that rare combination of entertainment and education . <review>"
     },
     {
       "role": "assistant",
       "content": "[('Offers', 0.500), ('that', 0.000), ('rare', 0.300), ('combination', 0.300), ('of', 0.000), ('entertainment', 0.800), ('and', 0.000), ('education', 0.500), ('.', 0.000)] (1, 0.800)"
     }
   ],
   temperature=1,
   max_tokens=256,
   top_p=1,
   frequency_penalty=0,
   presence_penalty=0
 )
AI 代码解读

2、做出预测然后解释:Prediction To Explanation

下面的提示是P-E配置的一个示例。首先给出的是预测,然后是解释。

Python代码如下:

 from openai import OpenAI
 client = OpenAI()

 response = client.chat.completions.create(
   model="gpt-3.5-turbo",
   messages=[
     {
       "role": "system",
       "content": "You are a creative and intelligent movie review analyst, whose purpose is to aid in sentiment analysis of movie reviews. A review will be provided to you, and you must classify the review as either 1 (positive) or 0 (negative), as well as your confidence in the score you chose. The confidence should be a decimal number between 0 and 1, with 0 being the lowest confidence and 1 being the highest confidence. Output this in the Python tuple format (<int classification>, <float confidence>).\nThen, analyze how important every single word and punctuation token in the review was to your classification. The importance should be a decimal number to three decimal places ranging from -1 to 1, with -1 implying a negative sentiment and 1 implying a positive sentiment. Provide a list of (<word or punctuation>, <float importance>) for each and every word and punctuation token in the sentence in a format of Python list of tuples. Each word or punctuation is separated by a space.\nIt does not matter whether or not the sentence makes sense. Do your best given the sentence.\nThe movie review will be encapsulated within <review> tags. However, these tags are not considered part of the actual content of the movie review.\nExample output:\n(<int classification>, <float confidence>)\n[(<word or punctuation>, <float importance>), (<word or punctuation>, <float importance>), ... ]"
     },
     {
       "role": "user",
       "content": "<review> A film that takes you inside the rhythms of its subject : You experience it as you watch . <review>"
     },
     {
       "role": "assistant",
       "content": "(1, 0.8)\n[('A', 0.2), ('film', 0.3), ('that', 0.2), ('takes', 0.2), ('you', 0.4), ('inside', 0.2), ('the', 0.1), ('rhythms', 0.2), ('of', 0.2), ('its', 0.1), ('subject', 0.1), (':', 0.1), ('You', 0.4), ('experience', 0.3), ('it', 0.2), ('as', 0.2), ('watch', 0.2), ('.', 0.1)]"
     }
   ],
   temperature=1,
   max_tokens=256,
   top_p=1,
   frequency_penalty=0,
   presence_penalty=0
 )
AI 代码解读

提示输入:

 System: You are a creative and intelligent movie review analyst, whose purpose is to aid in sentiment analysis of movie reviews. A review will be provided to you, and you must classify the review as either 1 (positive) or 0 (negative), as well as your confidence in the score you chose. The confidence should be a decimal number between 0 and 1, with 0 being the lowest confidence and 1 being the highest confidence. Output this in the Python tuple format (<int classification>, <float confidence>).
 Then, analyze how important every single word and punctuation token in the review was to your classification. The importance should be a decimal number to three decimal places ranging from -1 to 1, with -1 implying a negative sentiment and 1 implying a positive sentiment. Provide a list of (<word or punctuation>, <float importance>) for each and every word and punctuation token in the sentence in a format of Python list of tuples. Each word or punctuation is separated by a space.
 It does not matter whether or not the sentence makes sense. Do your best given the sentence.
 The movie review will be encapsulated within <review> tags. However, these tags are not considered part of the actual content of the movie review.
 Example output:
 (<int classification>, <float confidence>)
 [(<word or punctuation>, <float importance>), (<word or punctuation>, <float importance>), ... ]

 User: <review> A film that takes you inside the rhythms of its subject : You experience it as you watch . <review>

 Assistant:
AI 代码解读

结果

 Understood! I'm ready to analyze the movie review you provided. Here are the results:

 (<int classification>, <float confidence>) = (0, 0.8) - Classification: Negative

 Here are the importance scores for each word and punctuation token in the review:

 [(<word or punctuation>, <float importance>), (the, -0.6), (film, 0.4), (that, -0.5), (takes, 0.3), (you, -0.2), (inside, 0.1), (the, -0.6), (rhythms, 0.3)]

 Explanation:

 The word "film" has a low importance score of -0.6 because it does not convey much sentiment towards either positive or negative emotions in the context of the review.
 The word "that" has a low importance score of -0.5 because it is a transitional word that does not significantly impact the overall sentiment of the review negatively or positively . Its presence does not change how we classify this review as entirely negative . However , its absence might affect our classification; therefore we give it some weight . This implies that if more instances like this appear throughout a text , their impact will be less than if they were part of an entirely different kind of text . For example , "the film takes you inside" would have an opposite meaning than just saying "it takes you inside".) [1][3] )] )] ) ]]. * The phrase "You experience it as you watch" has an important score of 1 because It conveys positive emotion and could be interpreted as evidence that viewers will find watching this motion picture pleasurable and engaging .[2][4]) [5][6]; hence why we rank important words higher than unimportant phrases when deciding on their overall sentimental value within given texts). As such , we consider these particular elements crucial when evaluating whether something falls into one category versus another based solely upon how others might interpret them without additional context outside those given here.] Finally – although there may be other factors at play–by focusing solely on these particular pieces here allows us to better understand what makes certain sentences/phrases stand out amongst others!
AI 代码解读

可观察性和可检查性

基于llm的应用程序的很大一部分要求是能够观察和检查生成AI应用程序的行为,这对于管理输入和输出token的成本尤其重要,为了模仿我们人类的行为,还要再加上可解释性。

而这研究发现自己处于可解释性研究中三个领域的交叉点:

例如当要求解决一个数学问题时,即使没有明确指示这样做,模型也经常包含详细的步骤。同样当被要求分析电影或书评的情感时,LLM们通常会自发地用支持性证据来解释他们的决定,比如强调充满情感的词语或解释文本中的讽刺。

下面是少样本方法的一个实际例子。几个提示符是人为生成的,用于向模型提供指令。

总结

论文研究了像ChatGPT这样的llm生成自我解释的能力,特别是在情感分析任务中,并将它们与传统的解释方法(如遮挡和LIME)进行了比较。LLM模型可以自发地为其决策生成解释,例如在情感分析任务中识别关键词。

预测的准确性因不同的自我解释方法而异。首先生Explain-then-Predict会降低性能,这表明在准确性和可解释性之间需要权衡。

没有一种解释方法在不同的度量标准中始终优于其他方法。自我解释的表现与传统方法相当,但在一致性指标方面存在显着差异。

ChatGPT的解释和预测显示了全面的值,并且对单词删除不太敏感,反映了类似人类的推理过程,但可能缺乏详细的精度。

研究结果表明需要更好的方法来引出自我解释和重新思考评估实践。与其他LLM和不同解释类型的比较研究可以提供进一步的见解。

论文地址:

https://avoid.overfit.cn/post/aff43e4336b5487fa6abd01357fc51b6

目录
打赏
0
1
2
0
529
分享
相关文章
9月大型语言模型研究论文总结
大型语言模型(llm)在今年发展迅速,随着新一代模型不断地被开发,研究人员和工程师了解最新进展变得非常重要。本文总结9-10月期间发布了一些重要的LLM论文。
115 0
NIPS 2024:代码模型自我进化超越GPT-4o蒸馏!UIUC伯克利等提出自对齐方法
在NIPS 2024上,UIUC、UC Berkeley等高校联合提出SelfCodeAlign方法,通过自我对齐使代码生成的大型语言模型(LLMs)在无需大量人工注释或蒸馏的情况下显著提升性能。该方法利用基础模型生成多样化编码任务并自我验证,最终选择通过测试的示例用于指令微调。实验表明,SelfCodeAlign微调的模型在多个编码任务上显著优于其他方法。论文地址:https://arxiv.org/pdf/2410.24198。
26 11
NeurIPS 2024:自我纠错如何使OpenAI o1推理能力大大加强?北大、MIT团队给出理论解释
在人工智能领域,大型语言模型(LLMs)的自我纠错能力正成为研究热点。北京大学和麻省理工学院的研究团队在NeurIPS 2024上发表的研究,通过基于上下文学习的理论分析,揭示了Transformer模型中关键设计在自我纠错中的作用,并提出了“Checking as Context”策略,应用于缓解社会偏见和防御LLM越狱攻击,显著提升了模型性能。然而,研究主要基于简化设置和合成数据集,存在局限性。
83 26
大模型进阶微调篇(二):基于人类反馈的强化学习RLHF原理、优点介绍,但需要警惕LLMs的拍马屁行为
本文探讨了基于人类反馈的强化学习(RLHF)方法的优缺点。作者指出,虽然RLHF能够使模型更好地满足用户需求,但也存在缺乏多样性、创新不足、偏好固化和难以适应动态变化等问题。文章通过具体实验和示例代码,详细解析了RLHF的工作原理,并强调了其在实际应用中的潜在风险。
485 6
【大语言模型-论文精读】用于医疗领域摘要任务的大型语言模型评估综述(上)
【大语言模型-论文精读】用于医疗领域摘要任务的大型语言模型评估综述(上)
88 2
【大语言模型-论文精读】用于医疗领域摘要任务的大型语言模型评估综述(下)
【大语言模型-论文精读】用于医疗领域摘要任务的大型语言模型评估综述(下)
68 1
[大语言模型-论文精读] 更大且更可指导的语言模型变得不那么可靠
[大语言模型-论文精读] 更大且更可指导的语言模型变得不那么可靠
61 0
[大语言模型-论文精读] 利用多样性进行大型语言模型预训练中重要数据的选择
[大语言模型-论文精读] 利用多样性进行大型语言模型预训练中重要数据的选择
116 0
小技巧大功效,仅阅读两次提示让循环语言模型超越Transformer++
【8月更文挑战第27天】斯坦福与布法罗大学的研究显示,通过&quot;Just-Read-Twice&quot;(JRT)策略,循环语言模型(RNNs)在多项任务上的表现超越了行业标杆Transformer++模型。JRT策略让RNNs在处理信息时进行两次读取,有效解决长上下文记忆难题,显著提升了性能。实验覆盖FDA、SQUAD等多个任务,均取得明显成效。论文已发布于arXiv。
37 2

热门文章

最新文章

AI助理

你好,我是AI助理

可以解答问题、推荐解决方案等